Abstract

We consider probabilistic automata on a general state space and study their computational
power. The model is based on the concept of language recognition by probabilistic automata
due to Rabin [12] and on models of analog computation in a noisy environment suggested by
Maass and Orponen [?] and by Maass and Sontag [8]. Our main result is a generalization of
Rabin's reduction theorem, which implies that under very mild conditions the computational
power of the automaton is limited to regular languages.

Keywords: probabilistic automata, probabilistic computation, noisy computational systems,
regular languages, definite languages.

1 Introduction 

Probabilistic automata have been studied since the early 1960s [?]. Relevant to our line of
interest is the work of Rabin where probabilistic (finite) automata with isolated cut-point 
were introduced. He showed that such automata recognize regular languages, and identified 
a condition which restricts them to definite languages (languages for which there exists an 
integer r such that any two words coinciding on the last r symbols are both or neither in 
the language). 

Paz generalized Rabin's condition for definite languages and called it weak ergodicity. He
showed that Rabin's stability theorem holds for weakly ergodic systems as well [10, 11].

In recent years there has been much interest in analog automata and their computational
properties. A model of analog computation in a noisy environment was introduced by Maass and
Orponen in [?]. For a specific type of noise it recognizes only regular languages (see also [2]).
Analog neural networks with Gaussian-like noise were shown by Maass and Sontag [8] to be
limited in their language-recognition power to definite languages. This is in sharp contrast
with the noise-free case, where analog computational models are capable of simulating Turing
machines and, when containing real constants, can recognize non-recursive languages [15].

In this work we propose a model which includes the discrete model of Rabin and the 
analog models suggested in [8], and find general conditions (related to ergodic properties
of stochastic kernels representing probabilistic transitions of the automaton) that restrict its 
computational power to regular and definite languages. 

We denote the state space of the automaton by $\Omega$ and the alphabet by $\Sigma$. As usual, the
set of all words of length $r$ is denoted by $\Sigma^r$ and $\Sigma^* := \bigcup_{r \in \mathbb{N}} \Sigma^r$. We assume that $\Omega$ is a Polish
space and denote by $\mathcal{B}$ the $\sigma$-algebra of its Borel subsets.

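Let $\mathcal{E}$ be the Banach space of signed measures on $(\Omega, \mathcal{B})$ with the total variation norm

\[ \|\mu\|_1 := \sup_{A \in \mathcal{B}} \mu(A) - \inf_{A \in \mathcal{B}} \mu(A), \]

and let $\mathcal{L}$ be the space of bounded linear operators in $\mathcal{E}$ with the norm

\[ \|P\|_1 = \sup_{\|\mu\|_1 = 1} \|P\mu\|_1. \]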

Definition 1.1. An operator $P \in \mathcal{L}$ is said to be a Markov operator if for any probability
measure $\mu$, the image $P\mu$ is again a probability measure. A Markov system is a set of Markov
operators $\mathcal{T} = \{P_u : u \in \Sigma\}$.

With any Markov system $\mathcal{T}$, one can associate a probabilistic computational system as
follows. At each computation step the system receives an input signal $u \in \Sigma$ and updates its
state. If the probability distribution on the initial states is given by the probability measure
$\mu_0$, then the distribution of states after $n+1$ computational steps on inputs $w = w_0, w_1, \ldots, w_n$
is defined by

\[ P_w \mu_0 = P_{w_n} \cdots P_{w_1} P_{w_0} \mu_0. \]
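For a finite state space the operators $P_u$ reduce to row-stochastic matrices acting on row vectors, and the formula above becomes a left-to-right fold over the input word. The following is a minimal sketch under that assumption; the two kernels and the names `KERNELS`, `step` and `push` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction as F

# Hypothetical two-letter Markov system on three states; row x is P_u(x, .).
KERNELS = {
    "a": [[F(1, 2), F(1, 2), F(0)],
          [F(0), F(1, 2), F(1, 2)],
          [F(1, 2), F(0), F(1, 2)]],
    "b": [[F(1), F(0), F(0)],
          [F(1, 2), F(1, 2), F(0)],
          [F(0), F(1, 2), F(1, 2)]],
}


def step(mu, kernel):
    """One computation step: (P mu)(j) = sum over x of mu(x) * P(x, j)."""
    n = len(mu)
    return [sum(mu[x] * kernel[x][j] for x in range(n)) for j in range(n)]


def push(mu0, word, kernels=KERNELS):
    """State distribution P_w mu0 after reading word = w_0 w_1 ... w_n."""
    mu = list(mu0)
    for letter in word:  # w_0 is applied first, w_n last
        mu = step(mu, kernels[letter])
    return mu


mu0 = [F(1), F(0), F(0)]
print(push(mu0, "ab"))
```

Exact rational arithmetic is used so that total mass is preserved without rounding error.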

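If the probability of moving from state $x \in \Omega$ to a set $A \in \mathcal{B}$ upon receiving input $u \in \Sigma$ is
given by a stochastic kernel $P_u(x, A)$, then $P_u \mu(A) = \int_\Omega P_u(x, A)\,\mu(dx)$.
Let $\mathcal{A}$ and $\mathcal{R}$ be two subsets of $\mathcal{P}$, the set of probability measures on $(\Omega, \mathcal{B})$, that are separated by a $\rho$-gap:

\[ \mathrm{dist}(\mathcal{A}, \mathcal{R}) = \inf_{\mu \in \mathcal{A},\, \nu \in \mathcal{R}} \|\mu - \nu\|_1 = \rho > 0. \tag{1.2} \]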

A Markov computational system becomes a language recognition device by agreement that
an input string is accepted or rejected according to whether the distribution of states of the
system after reading the string is in $\mathcal{A}$ or in $\mathcal{R}$.
Finally, we have the definition:

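Definition 1.3. Let $\mu_0$ be an initial distribution and let $\mathcal{A}$ and $\mathcal{R}$ be two bounded subsets of $\mathcal{E}$
that satisfy (1.2). Let $\mathcal{T} = \{P_u : u \in \Sigma\}$ be a set of Markov operators on $\mathcal{E}$. We say that the
Markov computational system (MCS) $\mathcal{M} = (\mathcal{E}, \mathcal{A}, \mathcal{R}, \Sigma, \mu_0, \mathcal{T})$ recognizes the subset $L \subseteq \Sigma^*$
if for all $w \in \Sigma^*$:

\[ w \in L \iff P_w \mu_0 \in \mathcal{A}, \qquad w \notin L \iff P_w \mu_0 \in \mathcal{R}. \]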
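One concrete way to obtain a pair of sets with a gap is Rabin's isolated cut-point: accept when the mass on a set of accepting states is at least $\lambda + \varepsilon$ and reject when it is at most $\lambda - \varepsilon$. The sketch below assumes a finite state space; the names `classify` and `l1_distance`, the accepting set and the numbers are illustrative assumptions, not part of the paper. With the norm used here, any accepted and any rejected distribution are at distance at least $4\varepsilon$, so the gap satisfies $\rho \ge 4\varepsilon$.

```python
from fractions import Fraction as F


def classify(mu, accepting, cut, eps):
    """True if mu is in the accept set, False if in the reject set, None in the gap."""
    mass = sum(mu[i] for i in accepting)
    if mass >= cut + eps:
        return True
    if mass <= cut - eps:
        return False
    return None


def l1_distance(mu, nu):
    """Total variation norm of mu - nu (sup minus inf over sets) on a finite space."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


# Hypothetical data: two states, state 1 accepting, cut-point 3/5 isolated by 1/5.
cut, eps = F(3, 5), F(1, 5)
mu = [F(1, 5), F(4, 5)]   # accepting mass 4/5
nu = [F(3, 5), F(2, 5)]   # accepting mass 2/5
print(classify(mu, {1}, cut, eps), classify(nu, {1}, cut, eps), l1_distance(mu, nu))
```

A return value of `None` marks a distribution that lies in neither set; Definition 1.3 requires that this never happens for distributions reachable from $\mu_0$.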






We recall that two words $u, v \in \Sigma^*$ are equivalent with respect to $L$ if and only if
$uw \in L \iff vw \in L$ for all $w \in \Sigma^*$. A language $L \subseteq \Sigma^*$ is regular if there are finitely many
equivalence classes. $L$ is definite if for some $r > 0$, $wu \in L \iff u \in L$ for all $w \in \Sigma^*$ and
$u \in \Sigma^r$. If $\Sigma$ is finite, then definite languages are regular.
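The definition of a definite language can be probed by brute force on short words. The sketch below is only a bounded check (prefixes up to a fixed length), so a positive answer is evidence rather than proof; the function names and the two sample languages are illustrative assumptions.

```python
from itertools import product


def is_definite_up_to(member, alphabet, r, max_prefix=3):
    """Check wu in L <=> u in L for all |u| = r and all prefixes with |w| <= max_prefix."""
    for u in map("".join, product(alphabet, repeat=r)):
        for n in range(max_prefix + 1):
            for w in map("".join, product(alphabet, repeat=n)):
                if member(w + u) != member(u):
                    return False
    return True


def ends_with_ab(word):
    """Membership depends only on the last two symbols."""
    return word.endswith("ab")


def even_number_of_a(word):
    """A regular language whose membership depends on the whole word."""
    return word.count("a") % 2 == 0


print(is_definite_up_to(ends_with_ab, "ab", 2))
print(is_definite_up_to(even_number_of_a, "ab", 2))
```

The parity language is regular but fails the check for every $r$, since prepending a single `a` flips membership; this illustrates that the class of definite languages is strictly smaller than the class of regular languages.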

A quasi-compact MCS can be characterized as a system such that $\Sigma$ is finite and there
is a set of compact operators $\{Q_w \in \mathcal{L} : w \in \Sigma^*\}$ such that $\lim_{|w| \to \infty} \|P_w - Q_w\|_1 = 0$.
Section 2 is devoted to MCS having this property. Our main result (Theorem 1) states that
quasi-compact MCS can recognize regular languages only. As a consequence of this result,
we obtain the following theorem, which shows that any "reasonable" probabilistic automaton
recognizes regular languages only:

Theorem. Let $\mathcal{M}$ be an MCS. Assume that $\Sigma$ is finite, and that there exist a constant $K > 0$ and a
probability measure $\mu$ such that $P_u(x, A) \le K\mu(A)$ for all $u \in \Sigma$, $x \in \Omega$, $A \in \mathcal{B}$. Then, if a
language $L \subseteq \Sigma^*$ is recognized by $\mathcal{M}$, it is a regular language.

An MCS is weakly ergodic if there is a set of constant operators $\{H_w \in \mathcal{L} : w \in \Sigma^*\}$ such
that $\lim_{|w| \to \infty} \|P_w - H_w\|_1 = 0$. In Section 3 we carry over the theory of discrete weakly
ergodic systems developed by Paz [10, 11] to our general setup. In particular, if a language
$L$ is recognized by a weakly ergodic MCS, then it is a definite language.



2 The Reduction Lemma and Quasi-compact MCS 

We prove here a general version of Rabin's reduction theorem (Lemma 2.2), which connects
a measure of non-compactness of the set $\{P_w \mu_0 : w \in \Sigma^*\}$ with
the computational power of the MCS. Then we introduce the notion of a quasi-compact MCS and
show that these systems satisfy the conditions stated in Lemma 2.2.

If $S$ is a bounded subset of a Banach space $E$, Kuratowski's measure of non-compactness
$\alpha(S)$ of $S$ is defined by [1]

\[ \alpha(S) = \inf\{\varepsilon > 0 : S \text{ can be covered by a finite number of sets of diameter smaller than } \varepsilon\}. \tag{2.1} \]

A bounded set $S$ is totally bounded if $\alpha(S) = 0$.

Lemma 2.2. Let $\mathcal{M}$ be an MCS, and assume that $\alpha(\mathcal{O}) < \rho$, where $\mathcal{O} = \{P_w \mu_0 : w \in \Sigma^*\}$ is
the set of all possible state distributions of $\mathcal{M}$, and $\rho$ is defined by (1.2). Then, if a language
$L \subseteq \Sigma^*$ is recognized by $\mathcal{M}$, it is a regular language.

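Proof. If $\|P_u \mu_0 - P_v \mu_0\|_1 < \rho$, then $u$ and $v$ are in the same equivalence class. Indeed, for
any $w \in \Sigma^*$,

\[ \|P_{uw} \mu_0 - P_{vw} \mu_0\|_1 = \|P_w (P_u \mu_0 - P_v \mu_0)\|_1 \le \|P_u \mu_0 - P_v \mu_0\|_1 < \rho, \]

so by (1.2) the distributions $P_{uw} \mu_0$ and $P_{vw} \mu_0$ cannot lie one in $\mathcal{A}$ and the other in $\mathcal{R}$.
There is at most a finite number of equivalence classes, since there is a finite covering of $\mathcal{O}$
by sets with diameter less than $\rho$. □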
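The covering argument in the proof can be imitated numerically on a finite state space: scan words in order of length and keep a word as a new representative only if its state distribution is at distance at least $\rho$ from every representative kept so far. This is a rough sketch under stated assumptions; the two-state kernels, the names `push`, `l1` and `representatives`, and the chosen values of $\rho$ are hypothetical. Every scanned word then lies within distance less than $\rho$ of some representative, so the number of representatives bounds the number of equivalence classes met among the scanned words.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-state, two-letter Markov system (row x is P_u(x, .)).
KERNELS = {
    "a": [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]],
    "b": [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],
}


def push(mu, word):
    """State distribution after reading word, starting from mu."""
    for letter in word:
        k = KERNELS[letter]
        mu = [sum(mu[x] * k[x][j] for x in range(len(k))) for j in range(len(k))]
    return mu


def l1(mu, nu):
    return sum(abs(a - b) for a, b in zip(mu, nu))


def words_up_to(max_len):
    for n in range(max_len + 1):
        yield from map("".join, product("ab", repeat=n))


def representatives(mu0, rho, max_len):
    """Greedy set of words whose distributions are pairwise at distance >= rho."""
    reps = []
    for w in words_up_to(max_len):
        if all(l1(push(mu0, w), push(mu0, r)) >= rho for r in reps):
            reps.append(w)
    return reps


mu0 = [F(1), F(0)]
print(len(representatives(mu0, F(1, 4), 5)))
```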

Lemma 2.2 is a natural generalization of Rabin's reduction theorem [12], where the state
space $\Omega$ is finite, and hence the whole space of probability measures is compact.

Example 2.3. Consider an MCS $\mathcal{M}$ such that $\Omega = \mathbb{N}$ and $\Sigma$ is a finite set. If the sums
$\sum_j P_u(i, j)$ converge uniformly for each $u \in \Sigma$, then the corresponding operators $P_u \in \mathcal{L}$ are
compact [3], and consequently (since $\mathcal{O} \subseteq \bigcup_{u \in \Sigma} P_u \mathcal{P}$) $\mathcal{M}$ recognizes regular languages only.

Recall that a Markov operator $P$ is called quasi-compact if there is a compact operator
$Q \in \mathcal{L}$ such that $\|P - Q\|_1 < 1$ [?].

Definition 2.4. An MCS $\mathcal{M}$ is called quasi-compact if the alphabet $\Sigma$ is finite, and there
exist constants $r, \delta > 0$ such that for any $w \in \Sigma^r$ there is a compact operator $Q_w$ which
satisfies $\|P_w - Q_w\|_1 \le 1 - \delta$.

If an MCS $\mathcal{M}$ is quasi-compact, then there exist a constant $M > 0$ and a collection of
compact operators $\{Q_w : w \in \Sigma^*\}$ such that $\|P_w - Q_w\|_1 \le M(1 - \delta)^{|w|/r}$ for all $w \in \Sigma^*$.
The next theorem characterizes the computational power of quasi-compact MCS.

Theorem 1. If $\mathcal{M}$ is a quasi-compact MCS, and a language $L \subseteq \Sigma^*$ is recognized by $\mathcal{M}$,
then it is a regular language.

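Proof. Fix any $\varepsilon > 0$. There exist a number $n \in \mathbb{N}$ and compact operators $Q_w$, $w \in \Sigma^n$,
such that $\|P_w - Q_w\|_1 \le \varepsilon$ for all $w \in \Sigma^n$. For any words $v \in \Sigma^*$ and $w \in \Sigma^n$, we have
$\|P_{vw} \mu_0 - Q_w(P_v \mu_0)\|_1 \le \|P_w - Q_w\|_1 \le \varepsilon$. Since $Q_w(P_v \mu_0)$ is an element of the totally
bounded set $Q_w(\mathcal{P})$, the last inequality implies that the set $\mathcal{O} = \{P_u \mu_0 : u \in \Sigma^*\}$ can
be covered by a finite number of balls of radius arbitrarily close to $\varepsilon$. Since $\varepsilon$ is arbitrary,
$\alpha(\mathcal{O}) = 0 < \rho$, and the claim follows from Lemma 2.2. □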

Doeblin's condition, which follows, is a criterion for quasi-compactness (it should not be
confused with its stronger version, defined in Section 3, which was used in [8]).

Definition 2.5. Let $P(x, A)$ be a stochastic kernel defined on $(\Omega, \mathcal{B})$. We say that it satisfies
Condition D if there exist $\theta > 0$, $\eta < 1$ and a probability measure $\mu$ on $(\Omega, \mathcal{B})$ such that

\[ \mu(A) > \theta \implies P(x, A) > \eta \quad \text{for all } x \in \Omega. \]

Example 2.6. Condition D holds if $P(x, A) \le K\mu(A)$ for some $K > 0$ and probability
measure $\mu \in \mathcal{E}$ (e.g., $P(x, A) = \int_A p(x, y)\,\mu(dy)$ and $|p(x, y)| \le K$).
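The claim of Example 2.6 can be verified in one line by passing to complements. The derivation below is a sketch; the auxiliary constant $\varepsilon$ and the particular choice of $\theta$ and $\eta$ are assumptions made for illustration and do not come from the source.

```latex
% Assume P(x, A) \le K \mu(A) for all x and A (taking A = \Omega forces K \ge 1).
% Fix any \varepsilon with 0 < \varepsilon < 1/K and set
\theta = 1 - \varepsilon, \qquad \eta = 1 - K\varepsilon \in (0, 1).
% If \mu(A) > \theta, then \mu(A^c) < \varepsilon, hence for every x \in \Omega
P(x, A) = 1 - P(x, A^c) \ge 1 - K\mu(A^c) > 1 - K\varepsilon = \eta .
```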

Theorem 2. Let $\mathcal{M}$ be an MCS. If $\Sigma$ is finite and for some $n \in \mathbb{N}$, all stochastic kernels
$P_w(x, A)$, $w \in \Sigma^n$, satisfy Condition D, then $\mathcal{M}$ is quasi-compact.

The proof, given in Appendix A, follows the proof in [?] that Condition D implies
quasi-compactness for an individual Markov operator.

The following lemma, whose proof is deferred to Appendix B, gives a complete characterization
of a quasi-compact MCS in terms of its associated Markov operators.

Lemma 2.7. If an MCS $\mathcal{M}$ is quasi-compact, then $\alpha(\mathcal{T}^*) = 0$, where $\mathcal{T}^* = \{P_w : w \in \Sigma^*\}$.

It is easy to see that $\alpha(\mathcal{O}) \le \sup_{u \in \Sigma} \alpha(P_u \mathcal{P}) + \alpha(\mathcal{T})$, where $\mathcal{T} = \{P_u : u \in \Sigma\}$. This
yields a criterion for quasi-compactness in terms of the associated Markov system $\mathcal{T}$ and also
suggests generalizations to infinite alphabets, e.g. to the case where $\Sigma$ is a compact set and the
map $P(u) = P_u : \Sigma \to \mathcal{L}$ is continuous.

3 Weakly Ergodic MCS



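For any Markov operator $P$ define

\[ \delta(P) := \sup_{\mu, \nu \in \mathcal{P}} \tfrac{1}{2} \|P\mu - P\nu\|_1 = \sup_{x, y \in \Omega} \sup_{A \in \mathcal{B}} |P(x, A) - P(y, A)|. \]

Then (we refer to [?] for the properties of Dobrushin's coefficient $\delta(P)$):

\[ \delta(P) = \sup_{\lambda \in \mathcal{N} \setminus \{0\}} \frac{\|P\lambda\|_1}{\|\lambda\|_1}, \tag{3.1} \]

where $\mathcal{N} = \{\lambda \in \mathcal{E} : \lambda(\Omega) = 0\}$.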

Definition 3.2. A Markov system $\{P_u, u \in \Sigma\}$ is called weakly ergodic if there exist
constants $r, \delta > 0$ such that $\delta(P_w) \le 1 - \delta$ for any $w \in \Sigma^r$. An MCS $\mathcal{M}$ is called weakly
ergodic if its associated Markov system $\{P_u, u \in \Sigma\}$ is weakly ergodic.

It follows from the definition and (3.1) that $\delta(P_w) \le M(1 - \delta)^{|w|/r}$ for any $w \in \Sigma^*$ and
some $M > 0$. Maass and Sontag used a strong Doeblin's condition to bound the computational
power of noisy neural networks [8]. They essentially proved (see also [?]) the following
result:

Theorem 3. Let $\mathcal{M}$ be a weakly ergodic MCS. If a language $L$ can be recognized by $\mathcal{M}$, then
it is definite.
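The mechanism behind Theorem 3 is that a contracting suffix erases the influence of the prefix: by (3.1), $\|P_{wv}\mu_0 - P_v\mu_0\|_1 = \|P_v(P_w\mu_0 - \mu_0)\|_1 \le 2\delta(P_v)$ for any prefix $w$. The following sketch checks this bound on a hypothetical two-state system; the kernels and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-state, two-letter Markov system (row x is P_u(x, .)).
KERNELS = {
    "a": [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]],
    "b": [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],
}


def push(mu, word):
    """State distribution after reading word, starting from mu."""
    for letter in word:
        k = KERNELS[letter]
        mu = [sum(mu[x] * k[x][j] for x in range(len(k))) for j in range(len(k))]
    return mu


def dobrushin_of_word(word):
    """delta(P_word) on two states: total variation distance between the two rows."""
    row0, row1 = push([F(1), F(0)], word), push([F(0), F(1)], word)
    return sum(abs(a - b) for a, b in zip(row0, row1)) / 2


def max_prefix_effect(mu0, suffix, max_prefix=3):
    """Largest l1 change in the final distribution caused by prepending a short prefix."""
    base, worst = push(mu0, suffix), F(0)
    for n in range(max_prefix + 1):
        for w in map("".join, product("ab", repeat=n)):
            got = push(mu0, w + suffix)
            worst = max(worst, sum(abs(a - b) for a, b in zip(got, base)))
    return worst


mu0 = [F(1), F(0)]
print(max_prefix_effect(mu0, "abab"), 2 * dobrushin_of_word("abab"))
```

Once $2\delta(P_v)$ drops below the gap $\rho$, the distributions reached by $wv$ and by $v$ cannot be separated by the accept and reject sets, which is the definiteness property.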

Definition 3.3. A Markov operator $P$ satisfies the strong version of Condition D if $P(x, \cdot) \ge c\varphi(\cdot)$ for some
constant $c \in (0, 1)$ and a probability measure $\varphi \in \mathcal{P}$.

If a Markov operator $P$ satisfies the strong version of Condition D with a constant $c$, then $\delta(P) \le 1 - c$.
The following example shows that this condition is not necessary.

Example 3.4. Let $\Omega = \{1, 2, 3\}$ and $P(x, y) = \tfrac{1}{2}$ if $x \ne y$. Then $\delta(P) = \tfrac{1}{2}$, but $P$ does not
satisfy the strong version of Condition D.
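Both claims of Example 3.4 can be checked by direct computation. On a finite state space, $\delta(P)$ is the largest total variation distance between two rows, and the largest constant $c$ for which $P(x, \cdot) \ge c\varphi(\cdot)$ holds with some probability measure $\varphi$ is $\sum_y \min_x P(x, y)$. The sketch below uses these two finite-state formulas; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations


def dobrushin(kernel):
    """delta(P): maximum over pairs of rows of half their l1 distance."""
    n = len(kernel)
    return max(
        sum(abs(kernel[x][j] - kernel[y][j]) for j in range(n)) / 2
        for x, y in combinations(range(n), 2)
    )


def minorization_constant(kernel):
    """Largest c with P(x, .) >= c * phi(.) for some probability phi."""
    n = len(kernel)
    return sum(min(row[j] for row in kernel) for j in range(n))


HALF = F(1, 2)
P = [[F(0), HALF, HALF],
     [HALF, F(0), HALF],
     [HALF, HALF, F(0)]]

print(dobrushin(P), minorization_constant(P))
```

Every column of this kernel contains a zero, so the minorization constant is $0$ and no $c \in (0, 1)$ works, while $\delta(P) = \tfrac{1}{2} < 1$.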

We next state a general version of the Rabin-Paz stability theorem [?]. We first
define two MCS, $\mathcal{M}$ and $\widetilde{\mathcal{M}}$, to be similar if they share the same measurable space $(\Omega, \mathcal{B})$,
alphabet $\Sigma$, and sets $\mathcal{A}$ and $\mathcal{R}$, and differ only in their Markov operators.

Theorem 4. Let $\mathcal{M}$ and $\widetilde{\mathcal{M}}$ be two similar MCS such that the first is weakly ergodic. Then
there is $\alpha > 0$ such that if $\|P_u - \widetilde{P}_u\|_1 \le \alpha$ for all $u \in \Sigma$, then the second is also weakly
ergodic. Moreover, the two MCS recognize the same language.

For the sake of completeness we give a proof in Appendix C.

Appendices

A Proof of Theorem 2

Lemma A.1. [14] Let $K(x, A)$ and $N(x, A)$ be two stochastic kernels defined by

\[ K(x, A) = \int_A k(x, y)\,\mu(dy), \quad |k(x, y)| \le C_K, \]

\[ N(x, A) = \int_A n(x, y)\,\mu(dy), \quad |n(x, y)| \le C_N, \]

where $k(x, y)$ and $n(x, y)$ are measurable and bounded functions on $\Omega \times \Omega$, and $C_K$, $C_N$ are
constants. Then $NK \in \mathcal{L}$ is compact.

The proof in [?] is for a special case, so we give here an alternative proof.

Proof. Let $\{n_m(x, y) : m \in \mathbb{N}\}$ be a set of simple and measurable functions such that

\[ \int_\Omega \int_\Omega |n_m(x, y) - n(x, y)|\,\mu(dx)\,\mu(dy) \le \frac{1}{m}, \]

and define stochastic kernels $N_m(x, A) = \int_A n_m(x, y)\,\mu(dy)$. Since the corresponding operators
$N_m \in \mathcal{L}$ have finite-dimensional ranges, they are compact. On the other hand,

\[ \|NK - N_m K\|_1 = \sup_{\|\varphi\|_1 \le 1} \|NK\varphi - N_m K\varphi\|_1 \le \frac{C_K}{m}, \]

thus $NK = \lim_{m \to \infty} N_m K$ is a compact operator. □

Since the operators $P_u$, $u \in \Sigma$, satisfy Condition D, they can be represented as $P_u = Q_u + R_u$,
where $Q_u$ is defined by a stochastic kernel having a density $q_u(x, y)$ with respect to $\mu$ that is bounded
and measurable on $\Omega \times \Omega$, and $\|R_u\|_1 \le 1 - \eta$ [?]. Consider the expansion of
$P_w = \prod_{k=0}^{m} (Q_{w_k} + R_{w_k})$, $w \in \Sigma^{m+1}$, in $2^{m+1}$ terms:

\[ P_w = \prod_{k=0}^{m} R_{w_k} + \sum_{j=0}^{m} \Biggl( \prod_{k=0}^{j-1} R_{w_k} \Biggr) Q_{w_j} \Biggl( \prod_{k=j+1}^{m} R_{w_k} \Biggr) + \cdots + \prod_{k=0}^{m} Q_{w_k}. \]

By Lemma A.1, the terms that contain the operators $Q_{w_i}$ at least twice as factors are all compact operators in
$\mathcal{L}$. Since there are at most $m + 2$ terms in which the $Q_{w_i}$ appear at most once, we obtain that
for any $w \in \Sigma^{m+1}$ there is a compact operator $Q_w$ such that $\|P_w - Q_w\|_1 \le (m + 2)(1 - \eta)^m$.



B Proof of Lemma 2.7

We need the following proposition suggested to us by Leonid Gurvits. 

Proposition B.1. Let $Q_1, Q_2 \in \mathcal{L}$ be two compact operators, and let $H = \{P_j\} \subset \mathcal{L}$ be a
bounded set of operators. Then the set $\mathcal{Q} = \{Q_2 P Q_1 : P \in H\}$ is totally bounded.

Proof. Let $\mathcal{K} = \{\mu \in \mathcal{E} : \|\mu\|_1 \le 1\}$ and let $X_i \subset \mathcal{E}$, $i = 1, 2$, be two compact sets such
that $Q_i \mathcal{K} \subset X_i$. Define a bounded family $\mathcal{F} = \{f_j\}$ of continuous linear functions from $X_1$
to $X_2$ by setting $f_j = Q_2 P_j$. Since $H$ is bounded, $\mathcal{F} \subset C(X_1, X_2)$ is bounded and
equicontinuous, so by Ascoli's theorem it is conditionally compact. Fix any $\varepsilon > 0$ and
consider a finite covering of $\mathcal{F}$ by balls with radii $\varepsilon$. If $f_i$ and $f_j$ are included in the same
ball, then

\[ \|Q_2 P_i Q_1 - Q_2 P_j Q_1\|_1 \le \sup_{x \in X_1} \|f_i(x) - f_j(x)\|_1 \le 2\varepsilon. \]

Therefore $\alpha(\mathcal{Q}) \le 2\varepsilon$. This completes the proof since $\varepsilon$ is arbitrary. □

From Proposition B.1 it follows that the set $\{Q_u P Q_v : u, v \in \Sigma^n, P \in \mathcal{L}, \|P\|_1 = 1\}$ is
totally bounded.

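Fix any $\varepsilon > 0$. There exist a number $n \in \mathbb{N}$ and compact operators $Q_w$, $w \in \Sigma^n$, such
that $\|P_w - Q_w\|_1 \le \varepsilon$ for all $w \in \Sigma^n$. Since any word $w \in \Sigma^{\ge 2n+1}$ can be represented in the
form $w = u\tilde{w}v$, where $u, v \in \Sigma^n$, and

\[ \|P_w - Q_v P_{\tilde{w}} Q_u\|_1 = \|P_v P_{\tilde{w}} P_u - Q_v P_{\tilde{w}} Q_u\|_1 \]
\[ \le \|P_v P_{\tilde{w}} P_u - P_v P_{\tilde{w}} Q_u\|_1 + \|P_v P_{\tilde{w}} Q_u - Q_v P_{\tilde{w}} Q_u\|_1 \]
\[ \le \|P_u - Q_u\|_1 + \|P_v - Q_v\|_1 \le 2\varepsilon, \]

we can conclude that $\alpha(\mathcal{T}^{\ge 2n+1}) \le 2\varepsilon$, where $\mathcal{T}^{\ge 2n+1} = \{P_w : w \in \Sigma^{\ge 2n+1}\}$. It follows that
$\alpha(\mathcal{T}^*) = \alpha(\mathcal{T}^{\ge 2n+1}) \le 2\varepsilon$, completing the proof since $\varepsilon > 0$ is arbitrary.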

C Proof of Theorem 4

This result is implied by the following lemma: 

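Lemma C.1. Let $\mathcal{M}$ and $\widetilde{\mathcal{M}}$ be two similar MCS, such that the first is weakly ergodic and
the second is arbitrary. Then, for any $\beta > 0$ there exists $\varepsilon > 0$ such that $\|P_u - \widetilde{P}_u\|_1 \le \varepsilon$ for
all $u \in \Sigma$ implies $\|P_w - \widetilde{P}_w\|_1 \le \beta$ for all words $w \in \Sigma^*$.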

Proof. It is easy to verify by using the representation (3.1) that:

(i) For any Markov operators $P$, $Q$, and $R$, we have $\|PQ - PR\|_1 \le \delta(P)\|Q - R\|_1$.

(ii) For any Markov operators $P$, $\widetilde{P}$ we have $\delta(\widetilde{P}) \le \delta(P) + \|P - \widetilde{P}\|_1$.

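Let $r \in \mathbb{N}$ be such that $\delta(P_w) \le \beta/7$ for any $w \in \Sigma^r$, and let $\varepsilon = \beta/(7r)$. If $\|P_u - \widetilde{P}_u\|_1 \le \varepsilon$ for
any $u \in \Sigma$, then $\|P_w - \widetilde{P}_w\|_1 \le n\varepsilon$ for any $w \in \Sigma^n$. It follows that $\|P_w - \widetilde{P}_w\|_1 \le r\varepsilon \le \beta$ for any
$w \in \Sigma^{\le r}$. Moreover, for any $v \in \Sigma^r$ and $w \in \Sigma^*$, we have

\[ \|P_{wv} - \widetilde{P}_{wv}\|_1 \le \|P_{wv} - P_v\|_1 + \|P_v - \widetilde{P}_v\|_1 + \|\widetilde{P}_v - \widetilde{P}_{wv}\|_1 \]
\[ \le 2\delta(P_v) + \|P_v - \widetilde{P}_v\|_1 + 2\delta(\widetilde{P}_v) \le 4\delta(P_v) + 3\|P_v - \widetilde{P}_v\|_1 \le \beta, \]

completing the proof. □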
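Inequalities (i) and (ii) from the proof above can be sanity-checked on a finite state space, where a Markov operator acts on row vectors by a row-stochastic matrix, the operator norm of a difference is the largest $\ell_1$ distance between corresponding rows, and the composition $PQ$ (apply $Q$ first) corresponds to the matrix product of the kernel of $Q$ with the kernel of $P$. This is an illustrative sketch only; the matrices and function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations


def dobrushin(kernel):
    """delta(P): maximum over pairs of rows of half their l1 distance."""
    n = len(kernel)
    return max(
        sum(abs(kernel[x][j] - kernel[y][j]) for j in range(n)) / 2
        for x, y in combinations(range(n), 2)
    )


def op_distance(p, q):
    """||P - Q||_1 for kernels acting on row vectors: largest l1 distance between rows."""
    return max(sum(abs(a - b) for a, b in zip(rp, rq)) for rp, rq in zip(p, q))


def matmul(first, then):
    """Kernel of the composite step 'apply first, then apply then'."""
    n = len(first)
    return [[sum(first[i][k] * then[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


HALF = F(1, 2)
P = [[F(0), HALF, HALF], [HALF, F(0), HALF], [HALF, HALF, F(0)]]
P_TILDE = [[F(1, 10), F(9, 20), F(9, 20)],
           [F(9, 20), F(1, 10), F(9, 20)],
           [F(9, 20), F(9, 20), F(1, 10)]]
Q = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity
R = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # cyclic shift

# (i): ||PQ - PR||_1 <= delta(P) * ||Q - R||_1
print(op_distance(matmul(Q, P), matmul(R, P)), dobrushin(P) * op_distance(Q, R))
# (ii): delta(P~) <= delta(P) + ||P - P~||_1
print(dobrushin(P_TILDE), dobrushin(P) + op_distance(P, P_TILDE))
```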